Adequate sample size for developing prediction models is not simply related to events per variable

Authors

  • Emmanuel O. Ogundimu
  • Douglas G. Altman
  • Gary S. Collins
Abstract

OBJECTIVES The choice of an adequate sample size for a Cox regression analysis is generally based on the rule of thumb derived from simulation studies of a minimum of 10 events per variable (EPV). One simulation study suggested scenarios in which the 10 EPV rule can be relaxed. The effect of a range of binary predictors with varying prevalence, reflecting clinical practice, has not yet been fully investigated.

STUDY DESIGN AND SETTING We conducted an extended resampling study using a large general-practice data set, comprising over 2 million anonymized patient records, to examine the EPV requirements for prediction models with low-prevalence binary predictors developed using Cox regression. The performance of the models was then evaluated using an independent external validation data set. We investigated both fully specified models and models derived using variable selection.

RESULTS Our results indicated that an EPV rule of thumb should be data driven and that EPV ≥ 20 generally eliminates bias in regression coefficients when many low-prevalence predictors are included in a Cox model.

CONCLUSION Higher EPV is needed when low-prevalence predictors are present in a model to eliminate bias in regression coefficients and improve predictive accuracy.
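The EPV arithmetic referred to in the abstract can be sketched as follows. This is only an illustration, not the authors' resampling procedure: the function names and the example counts (150 events, 10 parameters) are hypothetical, and counting one "variable" per candidate model parameter is an assumption. The thresholds 10 and 20 are the ones named in the abstract.

```python
import math


def events_per_variable(n_events: int, n_parameters: int) -> float:
    """Return events per variable: outcome events divided by candidate parameters.

    Assumption: each candidate predictor parameter counts as one "variable".
    """
    if n_parameters <= 0:
        raise ValueError("n_parameters must be positive")
    return n_events / n_parameters


def min_events_required(n_parameters: int, epv_target: float) -> int:
    """Return the smallest number of events that reaches the target EPV."""
    if n_parameters <= 0:
        raise ValueError("n_parameters must be positive")
    return math.ceil(n_parameters * epv_target)


# Hypothetical example: 150 observed events, 10 candidate parameters.
epv = events_per_variable(150, 10)
meets_10 = epv >= 10  # the conventional rule of thumb
meets_20 = epv >= 20  # the higher threshold discussed in the abstract
events_for_20 = min_events_required(10, 20)
```

In this hypothetical case the data would satisfy the 10 EPV rule but fall short of EPV ≥ 20. The abstract argues that any such threshold should be data driven, so a calculation like this is a starting point rather than a sample-size guarantee.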

Related resources

Optimal Non-Parametric Prediction Intervals for Order Statistics with Random Sample Size

In many experiments, such as those in biology and quality control, sample size cannot always be considered a constant value. Therefore, the problem of predicting future data when the sample size is an integer-valued random variable can be an important issue. This paper describes the prediction problem of future order statistics based on upper and lower records. Two different cases for the ...

Trustworthy or flawed clinical prediction rule?

We read with interest the recently published paper by Hilder et al. [1], where the authors present the PRESET-Score, a new clinical prediction rule for patients with acute respiratory distress syndrome treated with extracorporeal membrane oxygenation (ECMO). While the topic is clinically relevant and interesting, we are worried that spurious findings, biased results, and overstated findings are...

Events per variable (EPV) and the relative performance of different strategies for estimating the out-of-sample validity of logistic regression models

We conducted an extensive set of empirical analyses to examine the effect of the number of events per variable (EPV) on the relative performance of three different methods for assessing the predictive accuracy of a logistic regression model: apparent performance in the analysis sample, split-sample validation, and optimism correction using bootstrap methods. Using a single dataset of patients h...

A Nonlinear Model of Economic Data Related to the German Automobile Industry

Prediction of economic variables is a basic component not only for economic models, but also for many business decisions. But it is difficult to produce accurate predictions in times of economic crises, which cause nonlinear effects in the data. Such evidence appeared in the German automobile industry as a consequence of the financial crisis in 2008/09, which influenced exchange rates and a...

Model Selection for Mixture Models Using Perfect Sample

We have considered a perfect sample method for model selection of finite mixture models with either known (fixed) or unknown number of components which can be applied in the most general setting with assumptions on the relation between the rival models and the true distribution. Both, one, or neither of the rival models may be well-specified or mis-specified, and they may be nested or non-nested. We consider mixt...

Journal title:

Volume 76, Issue

Pages -

Publication date: 2016